ABSTRACT



This paper describes the General Information Processing 
System, a computer program designed to serve a variety of
information processing applications. An input deck to the
program is composed of a data base and a description of the 
processing tasks to be performed on that data. A typical 
task would be to screen the data base according to given 
criteria and then output information from the data that met 
the criteria. For output, the system has flexible format- 
ting capabilities. 

Included in the paper, in Section II, is an example of 
a bibliographical application, complete with a listing of 
the input deck and the output that was produced. Sections 
III through VII contain detailed instructions on how to 
prepare an input deck. A description of system implemen- 
tation is contained in Section VIII. 
I. INTRODUCTION



Intelligent decision making is predicated on the avail- 
ability of certain requisite information. It is this 
"availability of requisite information" that has generated 
a great deal of interest in an area of computer usage 
called information retrieval. Information retrieval systems 
operate on a data base that may be either static or dynamic. 
However, they usually conform to some prescribed organiza- 
tional criteria. It is this organization of the data that 
allows the system to access the desired information. 

Most bodies of information (data files) can be organized 
into logical entities (data records). For each entity (data 
record) there are a number of attributes (record entries), 
the values of which describe that entity. For example, a 
bibliography is a body of information (data file) in which 
publications (data records) are described by the attributes, 
author, title, subject, etc., (record entries). For another 
example, a membership file (data file) is a body of infor- 
mation which describes persons (data records) by name, 
address, profession, etc., (record entries). Mailing lists, 
patient files, physical assets inventories, and many other 
bodies of information could readily be cited here, also. 

Within the realm of information retrieval are Management 
Information Systems, Command and Control Systems, and Air- 
line Reservation Systems, just to mention some of the more 
significant applications. Due to the magnitude and
complexity of these applications, special purpose computer
programs are written for them so that they can be made to
operate more efficiently. There is, however, a large class
of applications which cannot justify the expense of a
special purpose system but still require the capability of 
information retrieval. To meet this need, a group of
General Information Retrieval Systems has emerged. These
systems are designed to be used for a number of different 
general applications. 

References 5, 8, 9, and 10 are fairly detailed descriptions
of General Information Retrieval Systems. Reference
7 is an excellent paper devoted to this subject and 
includes discussion of 19 such systems. Reference 1 des- 
cribes the BASIC INFORMATION PROCESSOR which was the fore- 
runner of the system presented in this paper. 

The General Information Processing System (GIPSY) 
described here is designed to perform this type of non- 
arithmetic processing. It was intended to be flexible 
enough that it might be used for numerous applications and 
simple enough for user convenience. The system was imple- 
mented in PL/1 [Refs. 2, 3, and 4] since its capabilities 
along these lines simplified the programming chore. 

This paper will begin with an illustrative example of 
GIPSY applied to a bibliography, to give the reader an idea 
of how it is used and what it can do. Then the details 
about the input deck, the data to be processed, the output 
format, screening, and other control cards will be discussed.
The manner in which the system was implemented will be 
described next, followed by a few brief concluding remarks. 
II. A BIBLIOGRAPHY EXAMPLE



The application which motivated the development of GIPSY 
was the maintenance of a bibliography for research. This 
bibliography contained several hundred references. However, 
three were chosen as being typical of the references in 
the bibliography. These three references were used in the 
development and testing of GIPSY and, therefore, were con- 
venient to use in the following example. 

A. THE INPUT DATA 

In this example, the data base consists of the entire 
deck of cards used as data for GIPSY. The data file is made
up of the data describing these three references; thus each
reference is one record in the data file. Each record is,
in turn, described by twelve entries. The twelfth entry
is a remarks section and could contain as many as seven
paragraphs. Figure 1 contains a listing of the input deck
used for this example. 

A glance at Figure 1 reveals the general make-up of the
input deck. The cards that have a '*' punched in column
one are control cards which cause GIPSY to branch to that
section of the program which is to process the cards that
follow until the next control card is encountered. All
cards are begun in column one and any continuation cards
are begun in column six. The '/' (slash) character is a
reserved, special character, and as such is used by GIPSY
in a variety of ways. Though not readily apparent at this
point, the input deck can be logically divided into four
groupings.

The first such group begins with the *SETUP card and
ends with the first appearance of the *EOF card. This group
provides GIPSY with the necessary information to set up
the original data structure. The *SETUP card and the three
cards that follow provide an estimate of the total number
of cards that make up the input deck, an estimate of the
total number of records in the data file, the maximum
number of entries in any one record, and names to be
assigned to each entry. For example, entry number one is
to be named "AUTHOR," entry number two is to be named
"TITLE," etc.

The *DATA card and all the cards that follow down to
and including the *EOF card complete this first group.
Between these two control cards are the actual reference
descriptions that make up the data file. The data punched
in columns 77-80 is not included in the data record, but
is treated as a special program variable and can be used
in a variety of ways. In this example it was used to contain
an identification number for each record. Now that the data
structure has been set up, the remaining groupings describe
information processing tasks to be performed by GIPSY.

B. PROCESSING THE DATA 

The first of these task descriptions includes the cards
down to and including the first *PRINT card. The *FORMAT
card marks the beginning of a description of the format to
be used whenever any information is to be output and the *EOF
card marks the end of the format specification. The *PRINT
card tells GIPSY to print all the records in the data file 
according to the most recent format specified. Figures 
2, 3, and 4 are copies of the printouts obtained when this 
section of the program was executed. 

The next logical grouping, which continues down to and 
including the next *PRINT card, is similar to the previous 
group. It also requires that all records be printed; how- 
ever, this time the format was changed. Figure 5 is a 
copy of the results of the execution of this section of the 
program. 

All the remaining cards in the deck except the last
one, *ENDJOB, make up the last group. However, this last
group can be further divided into three subsections predi-
cated on three related tasks. Before these tasks begin,
the *CHANGE card and the card immediately following cause
the program variable named "COUNTER," which counts the
number of records that have been printed, to be reset to 0.

The *SCREEN card marks the beginning of the first task
and indicates that the next card contains the screening
criteria to be used whenever a screen is indicated. In this
case, it would look for the first occurrence of the character
string 'SIM' anywhere in the entry named "DESCRIPTORS."
These screening criteria are to remain in effect until changed
by a new *SCREEN card or until the program is terminated.
Next the *HEADING card causes a new page to be ejected by
the printer and the heading "SIMULATION REFERENCES" to
be printed starting in column 49 of the twelfth line down
from the top of the page. Next the *FORMAT card marks the
beginning of new format specifications for subsequent
printings. Finally the *PRINT SCREEN card tells GIPSY to
print only those records that satisfy the prevailing
screening criteria and print them according to the latest
format information provided.

The second task begins with a new *HEADING card which
causes 10 lines on the current page to be skipped and the
text "SIMULATION REFERENCES NOT LOCATED AT GEH" to be
printed beginning in column 38. Next, new screening
criteria are encountered which replace the previous ones.

Once again a *PRINT SCREEN card causes the records to be 
subjected to the prevailing screen and the ones that satisfy 
it to be printed. Note that a new format was not specified;
therefore, the records that passed this latest screen will
be printed in exactly the same format as were the ones in 
the previous section. 

The last task in this group, the *PRINTCOUNTER card,
causes 10 lines on the current page to be skipped and the 
number of records that satisfactorily passed both of the 
screens to be printed starting in column 28 followed 
immediately by the text "REFERENCES MET SCREENING CRITERIA." 
Figure 6 is a copy of the results of the execution of these 
three tasks. 

Finally, the last card in the deck, *ENDJOB, causes 
the printer to eject a new page and print the message "END 
OF JOB" across the top of the new page. 

The total amount of computer time required to produce 
Figures 2 through 6, utilizing the data deck as it appears 
in Figure 1, was 1.5 seconds. A certain amount of this 
time is fixed overhead, and computer time would not increase 
proportionally as the data base and/or the number of tasks 
are increased. For example, when the data file was 
expanded to 222 references and was processed according 
to these same requirements, the computer time expended was 
73.6 seconds. 
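
The timings quoted above can be checked with a short calculation. The following sketch is written in Python purely for illustration (GIPSY itself is in PL/1). It compares the observed time for 222 references with a strictly proportional extrapolation from the 3-reference run, and then fits a straight line through the two measurements. The linear model and the function names are assumptions made for this illustration and are not part of GIPSY.

```python
def proportional_estimate(small_n, small_t, large_n):
    """Time for large_n records if time scaled strictly with record count."""
    return small_t * large_n / small_n


def linear_fit(n1, t1, n2, t2):
    """Fit t = overhead + per_record * n through two measurements."""
    per_record = (t2 - t1) / (n2 - n1)
    overhead = t1 - per_record * n1
    return overhead, per_record


# Measurements quoted in the text: 3 records in 1.5 s, 222 records in 73.6 s.
estimate = proportional_estimate(3, 1.5, 222)
overhead, per_record = linear_fit(3, 1.5, 222, 73.6)

print(f"proportional estimate: {estimate:.1f} s (observed: 73.6 s)")
print(f"implied fixed overhead: {overhead:.2f} s")
print(f"implied time per record: {per_record:.3f} s")
```

Under this assumed linear model, about half a second of the 1.5-second run would be fixed overhead, which is consistent with the larger run finishing well below the proportional estimate.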

C. THE BIBLIOGRAPHY DATA FORM 

Figure 7 is an example of a form that was used in the 
preparation of the data for the bibliography mentioned 
above. Once the user has decided upon his file organization,
a form similar to Figure 7 can be reproduced inexpensively.
This form can be used during data accumulation and then
as the source from which the data are keypunched onto cards.
Figure 1.
III. THE INPUT DECK



As can be seen from the example in Figure 1, an input
deck for GIPSY consists of numerous control cards (with a 
star in column 1) interspersed with the data to be processed 
and other information. The typical order of input is to 
name the entries and then furnish the data to be processed. 
Next, specify a format and maybe a screen and then produce 
some output. Then specify either a new format or a new 
screen or both and then produce more output. This may be 
done any number of times. Also, cards may be included to 
print headings and to control various program variables. 

It is also possible to enter a new batch of data, with or 
without renaming the entries. 

A. OS/360 REQUIREMENTS 

To process an input deck to GIPSY on the computer at 
the Naval Postgraduate School, various OS/360 control cards 
are required. If the program is to be run utilizing a 
source deck, then the following are the control cards that 
are needed: 

// (User's Green Job Card)
// EXEC PL1LFCLG,REGION.GO=xxxK,TIME.GO=xx
//PL1L.SYSIN DD *

(Source Deck)

//GO.SYSIN DD *

(User's Input Deck)

/* (Salmon Colored)

To process an input deck utilizing a source deck
requires that the program be compiled and link edited each
time that it is run. These two steps require approximately
50 seconds of computer time. To save this unnecessary use
of computer time and also to spare the user the inconvenience
of repeated handling of a large source deck, GIPSY has been
compiled, link edited, and stored, as an object module,
on the IBM 2314 disk named "MARY." As a result, the only
computer time required is that needed to process the input 
deck and the only cards required are those that make up an 
input deck to GIPSY. The following are the control cards 
needed to process an input deck utilizing the program 
stored on the disk: 

// (User's Green Job Card)
//JOBLIB DD DSN=F0849-GIPSY,VOL=SER=MARY,UNIT=2314,DISP=OLD
// EXEC PGM=GIPSY,REGION=xxxK,TIME=xx
//SYSPRINT DD SYSOUT=A,SPACE=(TRK,(xxx,xxx),RLSE),
//             DCB=(RECFM=FB,LRECL=133,BLKSIZE=3325)
//SYSIN DD *

(User's Input Deck)

/* (Salmon Colored)

The x's in the above cards represent the time and storage 
requirements of the program and the input deck. These 
parameters will vary depending on the application. The 
region size may be approximated as follows: 150K for GIPSY
plus 32K for each allocation of WORD(i). (See Sections III-B
and VIII-B for an explanation of WORD.)
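
As a worked illustration of this rule of thumb, the region size can be computed as follows. The sketch is in Python for illustration only; the function name and the idea of passing the number of WORD allocations as an argument are assumptions, while the two constants come from the text.

```python
GIPSY_BASE_K = 150        # from the text: GIPSY itself needs about 150K
WORD_ALLOCATION_K = 32    # from the text: 32K for each allocation of WORD(i)


def region_size_k(word_allocations):
    """Approximate REGION value, in K, for a given number of WORD allocations."""
    if word_allocations < 0:
        raise ValueError("number of WORD allocations cannot be negative")
    return GIPSY_BASE_K + WORD_ALLOCATION_K * word_allocations


for count in (1, 2, 3):
    print(f"{count} allocation(s) of WORD: REGION={region_size_k(count)}K")
```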
B. SETTING UP THE PROGRAM



The first part of the input deck "sets the stage" by
providing GIPSY with the information necessary to set up
and get ready to process the data. Thus,
it is mandatory that the first card in the input deck be
a card with *SETUP punched in columns 1-6 (unless just
the listing described in Section III-D is desired).

There are eight program variables used to allocate 
storage and to dimension other variables. The user furnishes 
GIPSY with a rough description of his data in order that 
efficient use of storage can be effected. The following is 
a description of these variables. 

CARDS - This is to be an estimate of the total number 
of cards in the input deck. Default value is 
480. 

COLS - This is an estimate of the average number of 
columns used per card to furnish the data in 
the input deck. Default value is 70. This 
variable is multiplied by CARDS in the program 
to arrive at the total number of bytes to 
allocate to the program. Thus, this sets an 
upper limit on the size of the user's data
deck.

ESTIMATE - This variable is used for dimensioning pur- 
poses and is an estimate of the number of data 
records in the data file. Default value is 100. 
This variable sets an upper limit on the number
of records in a data file. 

ENTITY - The value assigned to this variable is the 
actual number of entries that make up a data 
record. Default value is 10. 

NOLINES - This gives the maximum number of logical
lines to be described in any format specifi-
cation. Default value is the value of ENTITY
(see Section V).

IDCOL - The value assigned to this variable is the
column number in which the special field on
the input cards is to begin. This variable is
further described in Section IV-B. Default
value is 77.

SIZE - This variable specifies the field width desired
to output the value of COUNTER. Default value
is 3. COUNTER is always an integer, right
justified in this field.

COLOR - This variable specifies the number of times an
item is to be printed when over-printing is
specified in an output format. Default value
is 3.

The value assigned to each of these variables must be a 
decimal integer. The card containing these variables and 
their values must follow the *SETUP card. The format for
this card is: VARIABLE=VALUE,VARIABLE=VALUE, etc.; and it
may be continued on as many cards as is necessary, with
the variables and their values being separated by commas
and a semicolon terminating the list. If no values are
assigned and default values are desired then a card with 
a semicolon punched anywhere must be included at this point 
in the input deck. 
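
The card format just described can be illustrated with a small parser. This is a sketch in Python, not the PL/1 code used by GIPSY. It assumes that any continuation cards have already been joined into one string, and it takes the default values of SIZE and COLOR to be 3. The function name parse_setup is hypothetical.

```python
# Default values as given in the text (SIZE and COLOR are assumed to be 3).
DEFAULTS = {"CARDS": 480, "COLS": 70, "ESTIMATE": 100, "ENTITY": 10,
            "IDCOL": 77, "SIZE": 3, "COLOR": 3}
ALLOWED = set(DEFAULTS) | {"NOLINES"}


def parse_setup(text):
    """Parse 'VARIABLE=VALUE,VARIABLE=VALUE;' into a dictionary of integers."""
    body, semicolon, _ = text.partition(";")
    if not semicolon:
        raise ValueError("the variable list must end with a semicolon")
    values = dict(DEFAULTS)
    for item in body.split(","):
        item = item.strip()
        if not item:
            continue                      # a lone ';' requests all defaults
        name, _, value = item.partition("=")
        name, value = name.strip(), value.strip()
        if name not in ALLOWED:
            raise ValueError(f"unknown variable: {name!r}")
        if not (value.isascii() and value.isdigit()):
            raise ValueError(f"{name} must be a decimal integer")
        values[name] = int(value)
    # NOLINES defaults to the value of ENTITY when it is not given.
    values.setdefault("NOLINES", values["ENTITY"])
    return values


settings = parse_setup("CARDS=50, ESTIMATE=3, ENTITY=12;")
print(settings)
print("bytes allocated for data:", settings["CARDS"] * settings["COLS"])
```

A card containing only a semicolon yields every default, and the product of CARDS and COLS gives the number of bytes set aside for the data, as described under COLS.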

C. NAMING THE ENTRIES 

The final requirement of the *SETUP card is that names
must be given to each of the entries that describe a data
record (author, title, publication, etc. in the bibliography
example). These names may then be used to refer to entries
when specifying a screen, outputting information, etc.

Entry names are restricted to being alphanumeric character
strings of length less than or equal to 50 characters.
That is, names cannot contain blanks or special characters,
and the first character of the name must be an alphabetical
character. A further restriction is that entry names may
not be any of the following: IN, AT, GT, LT, GE, LE, EQ,
NE, OR, AND, NOT. Entry names are punched with the first
name beginning in column 1 of the first card, and are
separated by slashes. Note that blanks may not appear
between slashes as they would be interpreted as being part
of that character string and blanks are not permissible
characters.

No characters may be punched in or after the column 
specified by IDCOL on the previous card. If the list of
names will not fit on one card, as many continuation cards
may be used as is necessary. The list may be broken
between any two entry names, immediately after the slash.
The first character of the next name begins in column 6
of the next card, and so on until the list has been com- 
pleted. These cards are to immediately follow the cards 
described above, in the input deck. The reader is referred 
to Figure 1 for an example of how the *SETUP group of 
cards should look. 
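
The naming rules of this section can be summarized in a short checking routine. The sketch below is in Python for illustration. It assumes the continuation cards have already been joined into a single string, and it tolerates one trailing slash after the last name, since the text does not say whether one is punched. The function names are hypothetical.

```python
RESERVED = {"IN", "AT", "GT", "LT", "GE", "LE", "EQ", "NE", "OR", "AND", "NOT"}


def is_valid_entry_name(name):
    """Apply the entry-name restrictions stated in the text."""
    return (0 < len(name) <= 50
            and name.isascii()
            and name.isalnum()            # no blanks or special characters
            and name[0].isalpha()         # first character alphabetical
            and name not in RESERVED)


def parse_entry_names(text):
    """Split a slash-separated list of entry names and validate each one."""
    names = text.split("/")
    if names and names[-1] == "":
        names.pop()                       # assumed: one trailing slash is allowed
    for name in names:
        if not is_valid_entry_name(name):
            raise ValueError(f"invalid entry name: {name!r}")
    return names


print(parse_entry_names("AUTHOR/TITLE/PUBLICATION/DESCRIPTORS"))
```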

D. INPUT DECK LISTING 

GIPSY has a limited capability of providing the user 
with a listing of his input deck. To cause a listing to 
be printed, a card may be inserted with *LIST punched in 
columns 1-5 between the JCL cards and the first card in 
the input deck. The result will be that all cards in 
the input stream, beginning with the one immediately 
following the *LIST card and ending with the one immediately 
in front of the salmon colored /* card, will be printed. 

The program will then terminate normally, without any 
further processing. 
IV. THE DATA TO BE PROCESSED



The data to be processed would normally be the greatest 
part of the input deck. The only limit on the number of 
records that can be processed is the amount of core storage 
available to the user. Each data record consists of values 
for the entries named in the *SETUP section. The data 
would normally follow the *SETUP group and is preceded by 
a control card with *DATA punched in columns 1-5. Another 
control card with *EOF punched in columns 1-4 is required 
to mark the end of the data file. 

A. THE DATA CARDS 

A data record is punched on a set of cards. Since the 
only limitation on the size of a record is that its total 
length be less than 3000 characters, as many cards as are 
needed may be used, with the data beginning in column 1 of 
the first card and in column 6 of all others. Data should 
not be punched in or after the column specified by IDCOL 
(to be discussed later in this section). 

Entries are separated by slashes. Consequently, all 
characters appearing between two slashes are taken as being 
constituents of that entry, the only exception being when 
breaking an entry for continuation on the next card. If 
the last character punched on the card is a slash, then no 
blanks are assumed and whatever appears in column 6 of the 
next card is treated as the first character of the next 
entry. If, on the other hand, the break occurs in the
middle of an entry, one blank is assumed between the last 
non-blank character on the card and the character in 
column 6 of the next card. 

There must be a one-to-one correspondence between the 
entries named in the *SETUP section and the entries in a 
data record. To maintain this correspondence, when a record 
contains a null entry, a slash is punched in the next 
column following the slash that marks the end of the previous 
entry. If these null entries occur at the end of the 
data record then the slashes may be omitted since GIPSY will 
automatically regard them as null. 

If the first two entries in a data record were null, 
then slashes would appear in columns 1 and 2. However, this 
would result in the computer interpreting that card as an
OS/360 JCL card, and the program would terminate. One
way to avoid this problem is to assign a blank to either 
one of these entries. 
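
The card conventions of this section can be sketched as two small routines: one that joins the cards of a record into a single string, and one that splits the string into entries. The sketch is in Python for illustration and is not GIPSY's PL/1 implementation. The function names are hypothetical, and a single slash at the very end of a record is treated as a terminator, which is an assumption.

```python
def join_cards(cards, idcol=77):
    """Join the cards of one record, applying the continuation rules."""
    text = ""
    for index, card in enumerate(cards):
        start = 0 if index == 0 else 5         # column 1, then column 6
        data = card[start:idcol - 1].rstrip()  # stop before the IDCOL column
        if text and not text.endswith("/"):
            text += " "                        # one blank when an entry is broken
        text += data
    return text


def split_record(text, entity):
    """Split a record into exactly `entity` entries, padding trailing nulls."""
    entries = text.split("/")
    if entries and entries[-1] == "":
        entries.pop()                          # assumed: final slash is a terminator
    if len(entries) > entity:
        raise ValueError("record has more entries than were named")
    return entries + [""] * (entity - len(entries))


cards = ["SMITH, J./A STUDY OF",
         "     QUEUES//1970/"]
record = join_cards(cards)
print(record)
print(split_record(record, 6))
```

In the example the second entry is broken across two cards, so one blank is supplied at the break; the third entry is null, and the two entries omitted at the end of the record are filled in as null.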

B. THE "IDENT" FIELD 

Column IDCOL to column 80 on the first card of every 
data record is handled specially. The contents of this 
field are stored in a character variable of length 
(80-IDCOL) + 1 and may be accessed by the user through 
the variable IDENT. In the bibliography example, this 
field was used to contain an identification number for each 
record and IDCOL had the value 77 (by default). Of course,
as this field becomes larger it reduces the number of
columns that may be used for data on every card.
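
The trade-off between the IDENT field and the data columns can be seen in a few lines. The sketch below is in Python for illustration; the function name ident_field is hypothetical, and a card shorter than 80 characters is assumed to be padded with blanks.

```python
def ident_field(first_card, idcol=77):
    """Return the IDENT field: column IDCOL through column 80 of the card."""
    width = (80 - idcol) + 1              # field length given in the text
    padded = first_card.ljust(80)         # assumed: short cards are blank-padded
    return padded[idcol - 1:idcol - 1 + width]


card = "SMITH, J./A STUDY OF QUEUES/".ljust(76) + "0042"
print(repr(ident_field(card)))            # columns 77-80 with the default IDCOL
print(len(ident_field(card, idcol=73)))   # a wider field leaves 72 data columns
```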
V. THE OUTPUT FORMAT



The format of the output from GIPSY is very flexible, 
but its specification is the most difficult part of the 
input deck to prepare. Fortunately, there usually would be 
only a few standard formats of interest in any particular 
application of the system, and these would have to be made 
up only once. Before proceeding, the difference between a
"logical" line of output and a "physical" line of output
should be understood. A "physical" line consists of a 
maximum of 132 characters and is one actual line produced 
by the line printer. A "logical" line is a line as speci- 
fied by the user and may contain so many characters that 
it requires several "physical" lines to actually print it. 

A complete format specification describes one or more 
logical lines, and the description of each logical line is 
contained on two or more punched cards. A control card 
with *FORMAT punched in columns 1-7 marks the beginning
of a format specification, and a control card with *EOF 
punched in columns 1-4 marks the end. 

The specification of a logical line can be thought of
as having two components, the form of the line and the 
content of the line. The form of the line is required for 
every logical line and is described on two cards hereafter 
referred to as form cards. The content of the line is 
described on one or more cards hereafter referred to as 
content cards.

A. FORM OF THE LINE



It is convenient for the discussion if the two form 
cards are envisioned as being placed end-to-end, next to 
each other so that they make up a field of 160 columns.
The first 133 columns of this field (all of the first card
and columns 1-53 of the second) then represent the 133 
print positions on the line printer. The remaining 27 
columns are divided into numerous small fields that will 
be described in Section B below. 

A slash must always be punched in column one of the
first form card in each logical line; the remaining 132
columns are then available to the user. These 132 columns
represent print positions and may be filled by three types
of characters: blanks, text, and variables. Text is written
into the print positions where it is desired to have it
appear. Slashes may not be used in the text since they are
used to indicate the first print position for a variable
which is defined on the associated content cards. Blanks
are used as text to be printed or as spacers for the
variables.

Since a slash marks the print position for the first 
character of the variable, it must be followed by a suf- 
ficient number of blanks to allow the placement of the 
value of the variable in that space, before the occurrence 
of the next slash or the next character of text. If the 
variable called for is too long to fit on the remainder 
of the physical line, continuation is provided for as will
be described shortly. 
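
The way slashes on the form cards mark print positions can be illustrated as follows. The sketch is in Python and is only an approximation of what GIPSY does. It works on the user-available print positions (that is, without the mandatory slash in column one), it neither checks for overlap with following text nor models continuation onto further physical lines, and the function name fill_form is hypothetical.

```python
def fill_form(form, values):
    """Place each value at the print position marked by the matching slash."""
    positions = [i for i, ch in enumerate(form) if ch == "/"]
    if len(positions) != len(values):
        raise ValueError("one value is needed for every slash on the form")
    line = list(form.replace("/", " "))       # slashes themselves are not printed
    for position, value in zip(positions, values):
        line[position:position + len(value)] = value
    return "".join(line).rstrip()


form = "/    TITLE: /"
print(fill_form(form, ["4.", "SIMULATION"]))
```

Here the first slash marks where the value of the first field is printed and the second slash, which follows the text "TITLE: ", marks where the second field begins.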



B. FORMAT PARAMETER FIELDS 

The remaining 27 columns of the second form card for
each logical line must contain the following information:

Column 55: A slash in this column indicates that there
is no content card associated with this log-
ical line. A blank in this column indicates
that the next card in the input deck is the
content card that defines the variables pro-
vided for on these form cards.

Column 56: A slash in this column indicates that when
a variable called for in this logical line
evaluates to the null string, printing of
all text since the processing of the previous
variable and any associated text on the
content card is to be suppressed. A blank
indicates that there is to be no suppression
of any printing in this logical line.

Column 57: A slash indicates that each time this log-
ical line is processed the resulting print-
ing is to begin on a new page. A blank
indicates that continuation on the same page
is desired.

Column 59: A slash indicates that this logical line is
a page heading and is to be printed, as des-
cribed, each time a new page is ejected.
This logical line must be less than or equal
to one physical line and variables cannot be
included. As many page heading lines may
be indicated as is desired. A blank indi-
cates that this logical line is not a page
heading.

Column 61: A slash indicates that all text appearing
in this logical line is to be over-printed.
The result is that the text will be in much
darker print than the variables. A blank
indicates that no over-printing is desired.

Columns 63-64: A decimal integer, right justified, must
be supplied which indicates the number of
blank lines that are to be inserted between
the last line of the previous logical line
printed (or the top of the page) and the
first line printed from this logical line.
This field cannot be left blank.

Columns 69-70: A decimal integer, right justified, must
be supplied which indicates the last physi-
cal line on the page on which printing from
this logical line may take place, and after
which no printing is to take place. There
can be a maximum of 59 physical lines on a
page. This field cannot be left blank.

Columns 72-74: A decimal integer, right justified, must
be supplied which indicates the column in
which printing is to begin if additional
physical lines are required to contain this
logical line (indentation for continuation
lines). This field cannot be left blank.

Columns 76-78: A decimal integer, right justified, must
be supplied which indicates the column number after
which no printing is to take place for all
physical lines in this logical line. This
field cannot be left blank.

Column 80: A slash indicates that this logical line
is to be used only once, the first time the
printing section is entered (for example,
a title page on a report). This logical
line must be less than or equal to one
physical line and variables cannot be
included. As many logical lines of this
type may be specified as is desired. A
blank indicates that this logical line is
not to be used in this manner.
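
The parameter fields listed above can be read off the second form card with a routine such as the following. The sketch is in Python for illustration and is not taken from GIPSY. It assumes that the over-printing flag is in column 61 and the blank-line count in columns 63-64, since the source text is damaged at those two labels. The function name and the dictionary keys are hypothetical.

```python
def parse_form_parameters(second_card):
    """Decode columns 55-80 of the second form card of a logical line."""
    card = second_card.ljust(80)

    def flag(column):
        return card[column - 1] == "/"        # slash = set, blank = not set

    def number(first, last):
        field = card[first - 1:last].strip()  # right justified decimal integer
        if not field.isdigit():
            raise ValueError(f"columns {first}-{last} cannot be left blank")
        return int(field)

    return {
        "no_content_card": flag(55),
        "suppress_on_null": flag(56),
        "new_page": flag(57),
        "page_heading": flag(59),
        "overprint_text": flag(61),           # column assumed, see lead-in
        "blank_lines_before": number(63, 64), # columns assumed, see lead-in
        "last_line_on_page": number(69, 70),
        "continuation_column": number(72, 74),
        "right_margin": number(76, 78),
        "print_once": flag(80),
    }


columns = [" "] * 80
columns[55] = "/"                 # column 56: suppress text when a variable is null
columns[62:64] = " 1"             # columns 63-64: one blank line before printing
columns[68:70] = "59"             # columns 69-70: last usable line on the page
columns[71:74] = "  6"            # columns 72-74: continuation lines start in column 6
columns[75:78] = "132"            # columns 76-78: no printing after column 132
parameters = parse_form_parameters("".join(columns))
print(parameters)
```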

C. CONTENT OF THE LINE

The content cards, associated with a logical line and 
its associated form cards, are used to supply the variables 
that are to be processed with the logical line. A content 
card may be thought of as being divided into fields. Within 
each field there may appear literals, variables, and/or 
conditional expressions. To each slash on the form card 
(in a print position) there must correspond a field on the
content card. However, any number of fields may be con- 
catenated to form a new field. 

The basic ingredient in any field is a variable which 
may be any entry name or one of the program variables: 
IDENT, COUNTER, or PAGENO. Entry names and IDENT have been
previously discussed; COUNTER and PAGENO will be defined
at this point before continuing with the description of 
the fields. 

In GIPSY there is a variable named COUNTER which is 
incremented by one each time a data record is processed 
for printing. COUNTER is set to zero each time the *SETUP 
and the *PRINTCOUNTER control cards are processed. COUNTER 
may be set to any value by the user utilizing the *CHANGE
control card (see Section VII-A). COUNTER can be used in
two ways: to count the number of records that successfully
pass a given screen and/or to number the records as
they are printed. PAGENO is the GIPSY variable that keeps 
track of the page numbers of pages that are printed. PAGENO 
is initialized each time the program processes the *PRINT 
control card. All pages printed are counted, except the 
pages that are used in a manner similar to a title page (a 
slash in column 80 of the second form card). Pages are
counted even though page numbering is not called for. 

A literal is any character that appears in the field
and is neither one of the variables defined above nor
a conditional expression. It is
especially efficient if the literals are enclosed in single
quotes. In the case of alphanumeric literal strings, it
is imperative that they be enclosed in single quotes.
Example:

COUNTER./ "TITLE"//', NUMBER OF PAGES = 'PAGES/

Here COUNTER, TITLE, and PAGES are variables; the period,
the double quotes, and the quoted string are literals.

If the fourth record was being processed, and its title was
"SIMULATION" and it contained 34 pages, then the following
output would be produced: 4. "SIMULATION," NUMBER OF
PAGES = 34.

A conditional expression is enclosed in angular brackets
and, in general terms, says to print certain information
only if a certain condition is satisfied. Inside the angu-
lar brackets the information to be printed is separated
from the condition by a slash. Literals and variables, as
defined above, may appear between the left angular bracket
and the slash. They will be printed only if the condition
defined to the right of the slash is satisfied. Immediately
following the slash must appear either a "+" or a "-"
followed immediately by a variable name, as defined above,
followed immediately by the right angular bracket. The "+"
is interpreted as "if the following variable is not null,"
the "-" is interpreted as "if the following variable is
null." Roughly speaking, a "+" says if something is in
the variable and the "-" says if nothing is in the variable.

Example:

a. <PUBLICATION/-TITLE>

PUBLICATION and TITLE are entry names. This example says to
print the contents of the entry named
"PUBLICATION" if the contents of the entry named "TITLE" is
the null string.

b. <' PAGES = '/+PAGES>

' PAGES = ' is a literal and PAGES is an entry name. This
example says to print the character string ' PAGES = '
if the contents of the entry named "PAGES" is not the null
string.
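
The rules for literals, variables, and conditional expressions can be illustrated with a small evaluator. The sketch below is in Python and is not GIPSY's PL/1 implementation. It assumes that the values of a record are held in a dictionary keyed by variable name, that a null entry is the empty string, and that a name not found in the dictionary is printed as a literal. The function name render is hypothetical.

```python
import re

# One token is a conditional <body/+NAME> or <body/-NAME>, a quoted literal,
# a name, or any other single character (treated as a literal).
TOKEN = re.compile(
    r"<(?P<body>[^<>]*)/(?P<sign>[+-])(?P<test>[A-Za-z][A-Za-z0-9]*)>"
    r"|'(?P<quoted>[^']*)'"
    r"|(?P<name>[A-Za-z][A-Za-z0-9]*)"
    r"|(?P<char>.)",
    re.DOTALL,
)


def render(field, values):
    """Evaluate one content field against the values of a record."""
    out = []
    for match in TOKEN.finditer(field):
        if match.group("sign"):
            is_null = values.get(match.group("test"), "") == ""
            # "+" prints when the variable is not null, "-" when it is null.
            satisfied = is_null if match.group("sign") == "-" else not is_null
            if satisfied:
                out.append(render(match.group("body"), values))
        elif match.group("quoted") is not None:
            out.append(match.group("quoted"))
        elif match.group("name"):
            name = match.group("name")
            out.append(values.get(name, name))
        else:
            out.append(match.group("char"))
    return "".join(out)


record = {"PUBLICATION": "OPERATIONS RESEARCH", "TITLE": "", "PAGES": "34"}
print(repr(render("<PUBLICATION/-TITLE>", record)))
print(repr(render("<' PAGES = '/+PAGES>", record)))
```

With this record, example a prints the publication because TITLE is null, and example b prints the literal because PAGES is not null.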

Fields are separated by slashes. Two fields may be 
concatenated by separating them by two slashes instead of 
one slash. This merges the two fields into one field 
which has a corresponding slash in a print position on 
the associated form card. Content cards are punched with 
the first field beginning in column one of the first content 
card and must end before column IDCOL. If the content line 
is too long to fit on one card, it may be continued by 
breaking the string immediately after a slash and punching 
the next character in column six of the next card. This 
process is repeated until the entire content line has been 
specified. Example: 



COUNTER./ AUTHOR,// "TITLE,"// PUBLICATION,// PUBLISHER//
     <(/-PUBLISHER>DATE<./+PUBLISHER><'), 'PAGES./-PUBLISHER>/



This example says to print the value of the counter, 
followed by a period, in the column indicated by the first 
slash on the form card. Then in the next print position 
marked by a slash on the form card, print the author's
name and a comma, concatenated with a blank followed by the
title and a comma, all in double quotes, concatenated with
a blank followed by the name of
the publication and a comma, concatenated with a blank 
followed by the publisher and a comma. Next, the date is 
to be concatenated. If the publisher is null, then the 
date is to be parenthesized followed by a comma, a blank, 
the number of pages and a period. If the publisher is not 
null, then the date is simply to be followed by a period. 
Figure 5 is the output produced by this example. 
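
The "+" and "-" rule for conditional expressions can be illustrated with a minimal Python sketch. It is not GIPSY's PL/I code. The function name render_conditional, the dictionary used to hold a record, and the sample values are assumptions made for illustration, and the part to the left of the slash is restricted to one quoted literal or one entry name.

```python
def render_conditional(expr, record):
    """Render one conditional expression such as <' PAGES = '/+PAGES>.

    Simplification: the part to the left of the slash is either one
    quoted literal or one entry name, and the literal holds no slash.
    """
    body = expr.strip()[1:-1]              # drop the angular brackets
    text, condition = body.rsplit("/", 1)  # split at the last slash
    sign, name = condition[0], condition[1:].strip()
    is_null = record.get(name, "") == ""
    # "+" prints when the entry is not null, "-" when it is null
    wanted = (sign == "+" and not is_null) or (sign == "-" and is_null)
    if not wanted:
        return ""
    text = text.strip()
    if text.startswith("'") and text.endswith("'"):
        return text[1:-1]                  # quoted literal
    return record.get(text, "")            # entry name


# Hypothetical record: TITLE is null, the other entries are not.
record = {"TITLE": "", "PUBLICATION": "CACM", "PAGES": "34"}
print(render_conditional("<PUBLICATION/-TITLE>", record))
print(render_conditional("<' PAGES = '/+PAGES>", record))
```

With this record the first call prints the publication, because TITLE is null, and the second prints the literal, because PAGES is not null.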

D. ORGANIZATION OF THE FORMAT SPECIFICATIONS 

The organization of the format specifications requires 
that any form cards that are to specify a title page must 
be the first cards immediately following the *FORMAT card.
Next must come any form cards that specify a page heading. 
All cards that come after this point are cycled through for 
each record processed for printing. Therefore, next would 
come the first and second form cards and then the content 
card for the first logical line. Next would come the cards 
for the second logical line, and so on until the entire 
format has been specified. The only limit on the number of 
logical lines in a format specification is the value 
assigned to the variable NOLINES in the *SETUP section. 



If page numbering is desired then its format must be 
provided for by the user. The variable PAGENO described 
above is indicated on the associated content card. Any 
text may appear on the form cards. However, PAGENO is 
the only variable that can be specified, and the length of 
the logical line must be less than or equal to one physical 
line. Page numbering can only be accomplished as the last 
printing to take place on a page. The decimal integer 
specified in columns 63 and 64 on the second format card 
of a page numbering specification is interpreted as the
number of the line on the page on which the page number is
to appear. This number must be less than or equal to the
number in columns 69 and 70. The page number logical line
may occur anywhere in the format specification.

VI. SCREENING



A screen in GIPSY is a boolean expression which states 
a condition that a data record must satisfy in order to 
be included in the output. This feature is the key to the 
usefulness of a system like this, in that it makes it easy 
to obtain listings of various cross-sections of the data 
being processed, such as all references on a specific sub- 
ject in a bibliography or all members with a certain 
attribute in a membership list. A screen specification 
consists of a *SCREEN card followed by one or more cards
with a boolean expression. 

A. SIMPLE BOOLEAN EXPRESSIONS 

A simple boolean expression is of the form: 'character
string' relational operator entry name. The character
string can be any group of characters, including blanks but 
not including single quotes. It is the character string 
that will be tested for according to the type of relation. 
The entry name can be any of those given in the *SETUP 
section or an abbreviation of one, made up from its leading 
characters. If an abbreviation is used, it must be long 
enough to distinguish it from any similar name that appears 
before it in the list of names. The relational operators 
and their description are as follows: 

IN - The IN expression has the value true when the
     character string occurs anywhere in the designated
     entry of a data record.

AT - The AT expression has the value true only when
     the character string occurs at the beginning of
     the designated entry.

GT - The GT expression has the value true when the
     character string yields a result of strictly
     greater than when compared character by character
     with the entry designated.

LT - The LT expression has the value true when the
     character string yields a result of strictly less
     than when compared character by character with
     the entry designated.

GE - The GE expression has the value true when the
     character string yields a result of greater than
     or equal when compared character by character with
     the entry designated.

LE - The LE expression has the value true when the
     character string yields a result of less than or
     equal when compared character by character with
     the entry designated.

EQ - The EQ expression has the value true when the
     character string yields a result of equal when
     compared character by character with the entry
     designated.

NE - The NE expression has the value true when the
     character string yields the result of unequal when
     compared character by character with the entry
     designated.






The following are examples of simple boolean expressions: 
'SMITH' AT AUTHOR 
'SIM' IN TITLE 
'1966' LE DATE 
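
The eight relational operators can be sketched in Python as follows. This is an illustration under stated assumptions, not the PL/I implementation: the function name simple_expression and the sample record are hypothetical, and Python compares strings by code point, which need not agree with the collating sequence or the handling of unequal lengths on the original system.

```python
import operator

# Relational operators other than IN and AT, as ordinary comparisons.
# The character string is the left operand and the entry the right one.
COMPARE = {
    "GT": operator.gt, "LT": operator.lt,
    "GE": operator.ge, "LE": operator.le,
    "EQ": operator.eq, "NE": operator.ne,
}


def simple_expression(string, relation, entry_value):
    """Evaluate 'string' relation entry for one data record."""
    if relation == "IN":
        return string in entry_value            # occurs anywhere
    if relation == "AT":
        return entry_value.startswith(string)   # occurs at the beginning
    return COMPARE[relation](string, entry_value)


# Hypothetical record used only to exercise the three examples above.
record = {"AUTHOR": "SMITH, J.", "TITLE": "SIMULATION", "DATE": "1968"}
print(simple_expression("SMITH", "AT", record["AUTHOR"]))
print(simple_expression("SIM", "IN", record["TITLE"]))
print(simple_expression("1966", "LE", record["DATE"]))
```

Note that the character string is the left operand, so '1966' LE DATE selects records dated 1966 or later.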

B. COMPLEX BOOLEAN EXPRESSIONS 

The logical operations AND, OR, and NOT may be used to
form "complex boolean expressions," in the usual fashion.
The normal hierarchy of boolean operations is assumed: NOT
has the highest precedence, AND is next highest, and OR has
the lowest precedence. Parentheses may be used to any level
desired for grouping or factoring of the expression, the
only requirement being that parentheses must be balanced.

For example, the expression A AND B OR NOT C AND D would 
be evaluated as follows: ((A AND B) OR ((NOT C) AND D)). 

If adjacent simple expressions have the same "relational 
operator entry name" part, it need be given only once with 
the last one. Whenever a relational operator appears, it 
applies to all character strings occurring since the 
previous relational operator, without regard for parentheses. 
The following is an example of a boolean expression which 
might be used with the bibliography in the example: 

('SIM PROG' OR 'LIST PROC') AND 'LANG COMP' IN DESCRIPTORS
AND ('1964' OR '1965' OR '1966' IN DATE) AND NOT
('GEH' OR 'YCC') AT LOCATION

Only those references dealing with the comparison of
simulation or list processing languages, published between 1964
and 1966, and not located at GEH or YCC would satisfy this
screening condition.

C. THE BOOLEAN EXPRESSION CARDS 

A boolean expression is punched on one or more cards
and put behind a *SCREEN card in the input deck. As many
cards as needed may be used, with the expression beginning
in column one of the first card and continuing in column
six of the others. Columns IDCOL to 80 cannot be
used on any of the cards as they are ignored by the program.
The expression can be broken anywhere except in the middle
of a character string, operator, or an entry name and the
program will automatically continue the expression properly.
Blanks, except where they appear within single quotes, are
not needed and they are ignored by the program.

When a boolean expression is being evaluated for a data
record, the scan proceeds from left to right. Consequently,
some savings in computer time may be realized if, when
constructing the screening expression, those elements which 
are most likely to be true in an OR subexpression or those 
most likely to be false in an AND subexpression are placed 
as far to the left as possible. 

The only restrictions on the size of the screening
expressions are: (a) the total length of the expression cannot
exceed 3000 characters, (b) the maximum length of a character
string (between single quotes) is 25 characters, and (c) the
maximum number of character strings in the expression is 50.

VII. OTHER CONTROL CARDS



Thus far, the uses of five control cards have been 
explained. These were the *SETUP, *LIST, *DATA, *FORMAT, 
and *SCREEN, which essentially are required for the basic
operation of the system. There are eight other control 
cards which may be included in an input deck also, to 
accomplish such things as printing formatted output, 
printing headings, changing the value of program variables, 
printing the elapsed computer time, reordering the data 
file, and signalling the end of a job. 

A. *CHANGE (CHANGE VARIABLE'S VALUE) 

When this control card is encountered in the input deck 
the program executes a GET DATA statement which will read 
the next card in the input deck and all subsequent cards 
until the variable list is completed (marked by the appear- 
ance of a semicolon). The execution of this statement 
will change the value of any program variable whose name 
appears in the variable list to the value given. However, 
it is strongly recommended that the use of this facility 
be limited to five variables. Four of these were described
in the *SETUP section: COUNTER, IDCOL, COLOR,
and SIZE. The fifth variable is SLASH.

SLASH is a special program variable and as such is
reserved for program use only. This variable is initialized
to a '/' when the program is loaded. If the user has a
great need to use the character '/', then he may change the
value of SLASH to any other character which will then, in
turn, be reserved from future use until changed again or
the program is terminated. This new character would then
be used in lieu of the slash wherever described in this
paper. To accomplish the desired changes, one or more
cards must follow the *CHANGE card with statements of
the form "variable = value." Each of these statements is
separated from the next by a comma, and cannot be split
between cards. All 80 columns may be used and blanks are
ignored; the only other requirement is that the variable
list be terminated by a semicolon. All numbers assigned
as values must be integers and any character assigned to
SLASH must be enclosed in single quotes.

B. *ORDER (REORDER DATA RECORDS) 

Whenever the *DATA control card is encountered in the 
input deck and the subsequent data file processed, the 
records in this data file are ordered sequentially as they 
are read in, and it is in this order that the data records 
are processed and printed whenever so indicated. This 
order may be changed at any time by the user, and this 
new order remains in effect until another change is called 
for or until termination of the program. To change this 
order, a card with *ORDER punched in columns 1-6, followed 
by two or more blanks, followed by the word "BY," followed 
by zero or more blanks, followed by an entry name (as
specified in the *SETUP section), is inserted in the input
deck at the point in the processing at which reordering
is desired. This ordering procedure reorders the records
based on their relative position in the collating sequence,
from the lowest to the highest. Null entries are placed
at the end of the new order, in the order in which they
appear in the input deck. For example, *ORDER BY TITLE
would reorder the data file alphabetically by titles.
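
The ordering rule, lowest to highest with null entries last in their original order, can be sketched in Python. The function name order_by and the sample records (including the ID field) are hypothetical, and Python's code point ordering stands in for the collating sequence of the original system.

```python
def order_by(records, entry_name):
    """Return records sorted by one entry, with null entries last.

    sorted() is stable, so records whose entry is null keep their
    original relative order.
    """
    return sorted(
        records,
        key=lambda rec: (rec.get(entry_name, "") == "",
                         rec.get(entry_name, "")),
    )


# Hypothetical data file; ID records the input order.
records = [
    {"ID": 1, "TITLE": "SIMULATION"},
    {"ID": 2, "TITLE": ""},
    {"ID": 3, "TITLE": "LIST PROCESSING"},
    {"ID": 4, "TITLE": ""},
]
for rec in order_by(records, "TITLE"):
    print(rec["ID"], repr(rec["TITLE"]))
```

The sort key is a pair: the first element pushes null entries to the end, and the second orders the remaining records by the entry's value.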

C. *PRINT (PRINT OUTPUT) 

When this control card is encountered in the input deck 
it causes some information from the data file to be printed. 
If the records are to be screened (according to the most
recent criteria) for inclusion in the output then the word
"SCREEN" must appear somewhere on the *PRINT control card.
If screening is not to be performed and all records are to be
included in the output, then the word "SCREEN" must not
appear on the *PRINT control card.

The order in which the records are printed is specified
by the user. If an *ORDER control card has not appeared
previously in the input deck, then the records will be
processed in the same order that they appeared in the data
deck.

When a record is processed for printing, the most recent
format specification is used. There are no default format
specifications. If printing is requested before a *FORMAT
control card has been processed, an error condition will
be raised.

D. *EOF (END OF FILE)

The *EOF control card is used in only two
situations in the input deck. It is used to mark the end
of the data file on input and to mark the end of the format
specifications. Thus a card with *EOF punched in columns
1-4 should be the last card in the *DATA and the *FORMAT
sections of the input deck.

E. *ENDJOB (END OF JOB)

This card is most conveniently placed at the end of an 
input deck, although it may be used at the ends of complete 
sections within an input deck. It merely causes the message 
"END OF JOB" to be printed at the top of the next page in 
the output, and does not actually cause termination of the 
program. 

The remainder of the control cards have one requirement 
in common, and it will be described at this point instead of 
separately for each control card. These control cards when 
encountered in the input deck cause some printing to take 
place. Thus, the one thing they have in common is the card 
which is to provide the necessary information to format the 
output. This card is of the following form: COLUMNS = X,
LINES = Y, PAGES = Z; where X is the number of the column
in which the specified printing is to begin, Y is the number
of lines that are to be skipped down the page and on which
the printing is to begin, and Z is the number of pages that
are to be ejected before printing is to begin. This card
may be punched in a free format as long as a comma separates
the items in the list and a semicolon marks the end of the
list. All three of these variables have a default value of
one; thus they need only be specified when a different value
is desired. If all default values are desired, a card must
then be inserted with a semicolon punched anywhere in
columns 1 to 80.
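
The defaults and the free format of this card can be illustrated with a short Python sketch. The function name parse_format_card is hypothetical, and treating blanks as insignificant is an assumption based on the "free format" wording; the original program reads the card with PL/I data-directed input.

```python
def parse_format_card(card):
    """Parse 'COLUMNS = X, LINES = Y, PAGES = Z;' into a dictionary.

    Each of the three values defaults to 1, so a card holding only
    a semicolon selects all defaults.
    """
    values = {"COLUMNS": 1, "LINES": 1, "PAGES": 1}
    if ";" not in card:
        raise ValueError("the list must be ended by a semicolon")
    items = card.split(";", 1)[0]          # ignore anything after the ';'
    for item in items.split(","):
        item = item.replace(" ", "")       # assumption: blanks are ignored
        if not item:
            continue
        name, _, value = item.partition("=")
        if name not in values:
            raise ValueError(f"unknown item: {name}")
        values[name] = int(value)
    return values


print(parse_format_card("COLUMNS = 10, PAGES = 2;"))
print(parse_format_card("   ;"))
```

Only the items that differ from the default need to appear, in any order.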

F. *HEADING (PRINT HEADING)

The *HEADING card provides the user with the capability
of inserting formatted text, which is independent of the
data being processed, in the output stream. Three cards
are required to print a heading: the *HEADING card, followed
by the formatting card described above, followed by a card
with the text to be printed, punched starting in column one.
All 80 columns of this card are printed.

G. *PRINTCOUNTER (PRINT "COUNTER") 

This control card is associated with the program vari- 
able COUNTER. As mentioned earlier, COUNTER counts the data 
records that are passed to the printing section of the 
program. When *PRINTCOUNTER is encountered in the input 
stream, it immediately reads the next card which contains 
the formatting information described above. The text is 
taken from columns 15 to 80 of the *PRINTCOUNTER card and is 
concatenated with the value of COUNTER. This string is 
printed according to the format specified. Then COUNTER 
is reset to zero.

H. *TIME (PRINT ELAPSED TIME)

This control card enables the user to determine the 
computer time taken to perform various tasks with GIPSY. 
When it is encountered, the elapsed time, in seconds, since 
the previous *TIME card was processed is printed, followed 
by the contents of columns 15 to 80 of this card. A second 
card must follow the *TIME card with the formatting
information described above. For the first *TIME card in the
deck, the elapsed time is computed from the time the program
was loaded.

For these elapsed time computations, the current time
of day is used. Thus, unfortunately, when GIPSY is being
run in an MVT environment, the elapsed time would be a
total elapsed time, including OS/360 interrupts, and not
just the amount of CPU time required by GIPSY.

VIII. SYSTEM IMPLEMENTATION



Basically GIPSY is organized into separate sections of 
computer code, each of which performs a different informa- 
tion processing task. This program organization, made 
possible by the ability to subscript labels in PL/I, was 
chosen over a "procedure" or "subroutine" orientation in 
that it would be faster, while requiring approximately the 
same amount of coding. Entry into (and departure from) a
segment of code is caused by the occurrence of a control
card in the input stream. As a consequence, an input deck
to GIPSY is divided into sections according to the tasks
to be performed, with each section preceded by a control
card.

The names of the control cards that are associated
with eleven such sections of the program are placed in
the array CONTROL(i). When a control card is encountered,
its name is looked up in CONTROL(i) and this subscript
is then used to provide the proper subscript for the label
variable SEGMENT(i) when a GOTO SEGMENT(i) statement is
executed. The control card *SETUP is handled a little
differently in that instead of returning its position in
the array CONTROL, a statement GOTO SETUPS is executed; the
reason for this variation will become clear later.
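
The table-driven branch can be sketched in Python with two parallel lists. This is only an analogy: PL/I transfers control with GOTO through a subscripted label, whereas the sketch calls functions. The three handlers and the shortened CONTROL list are hypothetical stand-ins for the eleven sections.

```python
def do_print(log):
    log.append("print section")


def do_order(log):
    log.append("order section")


def do_endjob(log):
    log.append("END OF JOB")


# Parallel tables: CONTROL[i] names the card, SEGMENT[i] is its section.
CONTROL = ["*PRINT", "*ORDER", "*ENDJOB"]
SEGMENT = [do_print, do_order, do_endjob]


def dispatch(card, log):
    """Look the card name up in CONTROL and branch to SEGMENT[i]."""
    name = card.split()[0] if card.strip() else ""
    if name not in CONTROL:
        raise ValueError(f"not a control card: {card!r}")
    i = CONTROL.index(name)
    SEGMENT[i](log)


log = []
for card in ["*ORDER  BY TITLE", "*PRINT SCREEN", "*ENDJOB"]:
    dispatch(card, log)
print(log)
```

A single lookup replaces a chain of comparisons, which is the property the text attributes to subscripted labels.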

A. SETTING UP THE PROGRAM 

The first executable section of code in the program
fetches the current time of day, converts it from a
character variable which gives the time in hours, minutes,
seconds and milliseconds into a fixed binary variable
which gives the time in milliseconds, and then stores this
value in a variable named CLOCK. This value is used later,
the first time the *TIME control card is processed. This
is followed immediately by a section that processes the
*LIST control card if it is present as the very first card
of the input deck (after the JCL cards).

The *LIST section is very simple in that it reads a 
card from the input stream and immediately outputs this 
same card, centering it on the output page. It starts 
printing the card images on line 13 of the page and then 
prints 46 such images, ejects a page, prints another page, 
and so on until the input stream is emptied. At this point 
control is transferred to the end of the program and term- 
ination occurs normally. 

If, on the other hand, the first card encountered in
the input deck is the *SETUP control card, the program
begins setting up to process the input deck. The first
thing GIPSY does is to assign default values to several 
variables. A GET DATA statement is then executed to read 
in the next card, and any continuation cards, obtaining 
the user supplied values. 

At this point in the execution of the program, utiliz- 
ing the quantities just obtained (or their default values), 
calculations are made to determine the amount of storage 
to request from OS/360 and to determine the dimensions of
19 subscripted variables. Next the internally nested
procedure labeled START is called and these 19 variables
are declared utilizing the results of the calculations 
just described. The method just described makes use of 
the capability in PL/I to pass the values of variables 
to subordinate procedures for dimensioning the locally 
declared variables, to allocate storage dynamically. In 
this manner the storage allocated more closely approxi- 
mates the needs of the application and thus is more effi- 
cient than a blind, fixed allocation. 

Before the last bit of setting up is done, a few vari- 
ables are initialized. Finally then, the procedure PACKHOLD, 
which extracts character strings from cards, packing them 
into the variable HOLD, is called to read in the remaining 
data cards in the *SETUP section. The variable HOLD now 
contains all the entry names, separated by slashes. The 
program then processes HOLD, picking off the entry names 
and placing them in the array ATTRIBUTES(i). (ATTRIBUTES(i)
and CONTROL(i) are the only two tables which ever require a
lookup during the execution of the entire program.)

After the last card naming the entries has been read,
the next card in the input deck is read and a lookup for a
control card is conducted. If none is found, a diagnostic is
printed and program execution is terminated; otherwise a
branch is made to that section of the program which processes
the control card found.

B. PROCESSING THE DATA FILE



The next control card normally found at this point is 
the *DATA card. This card marks the beginning of the user's 
data file that is to be stored at this point and then man- 
ipulated throughout the remainder of the execution of GIPSY. 
This data base is stored in an array of sequentially ordered 
lists named WORD(i). It will facilitate understanding of 
the remainder of this section if the reader will refer to 
Figure 8 freely. WORD is a character variable with a 
length attribute of 32767 characters, varying. This is 
obviously one of the variables whose dimension is calculated 
with user supplied information. 

When PL/I allocates storage for a variable with a vary- 
ing length attribute it has no choice but to reserve space 
for its maximum length. Therefore, storage is allocated 
for the data file in chunks of 32K bytes of core. The 
calculation of the dimension of WORD (how many 32K chunks 
the user needs) is done as follows:

STORAGE = CARDS * COLS;

L = (STORAGE/32767) + 1;

WORD is then given the dimension L each time the Procedure 
START is entered as the result of the occurrence of a 
*SETUP control card. 
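
The arithmetic can be checked with a small Python sketch. It assumes that CARDS and COLS are the user-supplied card count and column count from the *SETUP section and that the PL/I division truncates to an integer when assigned to L; the function name word_dimension is hypothetical.

```python
CHUNK = 32767  # maximum length of one element of WORD


def word_dimension(cards, cols):
    """Number of 32767-character elements needed for the data file."""
    storage = cards * cols
    return storage // CHUNK + 1   # integer division, then one more


# 1000 cards of 80 columns need 80000 characters of storage.
print(word_dimension(1000, 80))
```

For 80000 characters the quotient is 2, so WORD receives three elements. The added one guarantees at least one element even for a very small data file.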

TABLE(i,j) is another important variable necessary for 
the storage of the data file which is dimensioned with user 
supplied quantities. In the declaration statement
i = ESTIMATE and j = ENTITY + 2 (ENTITY is the number of
entries in a data record). The rows of TABLE correspond
to data records, and the columns correspond to entries. 

To input and store the data file the PACKHOLD procedure 
is called which obtains the first data record, packs it 
into HOLD, and then returns HOLD. The program then places 
the record (without the slashes), character by character 
into contiguous character positions of WORD. As it does 
this, it also places pointers to the beginning character 
position for each entry into TABLE(i,j). 

After the record is processed, the data found in the 
IDENT field is placed in the array SERIALS(i), the pointer 
to the next available character position in WORD is placed 
in TABLE(i, ENTITY+1), the value of the current subscript 
of WORD is placed in TABLE(i, ENTITY+2), and then i (the 
row number) is incremented by one. This process is repeated 
over and over until the *EOF control card is encountered 
in the input stream. 
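
The relationship between WORD and TABLE can be sketched in Python. The sketch follows the description above but makes several assumptions that the text does not confirm: a record never spans two elements of WORD, an entry's length is recovered from the next pointer in its row, and the SERIALS array is omitted. The function names and sample records are hypothetical.

```python
def store_records(records, chunk=32767):
    """Pack records into WORD and keep entry start positions in TABLE.

    records is a list of lists of entry strings (slashes removed).
    Each TABLE row holds one start pointer per entry, then the next
    free position in WORD, then the subscript of the WORD element.
    Positions are 1-based, as PL/I string positions are.
    """
    word = [""]
    table = []
    for entries in records:
        length = sum(len(e) for e in entries)
        if len(word[-1]) + length > chunk:
            word.append("")       # assumption: records do not span elements
        row = []
        for entry in entries:
            row.append(len(word[-1]) + 1)   # start of this entry
            word[-1] += entry
        row.append(len(word[-1]) + 1)       # next available position
        row.append(len(word))               # current subscript of WORD
        table.append(row)
    return word, table


def entry(word, table, i, j):
    """Recover entry j of record i (both 0-based here) from the pointers."""
    row = table[i]
    text = word[row[-1] - 1]
    return text[row[j] - 1 : row[j + 1] - 1]


word, table = store_records(
    [["SMITH", "SIMULATION", "1966"], ["JONES", "", "1965"]]
)
print(table)
print(entry(word, table, 1, 0))
```

Under these assumptions a null entry needs no storage at all: its pointer simply equals the pointer of the entry that follows it.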

C. THE FORMAT SPECIFICATIONS 

To process the format specifications requires 19 one- 
dimensional arrays. These arrays fall into two groups, 
based on how their subscripts are derived. In one group, 
the subscript is the number of the logical line that the 
particular value is associated with. This group can be 
thought of as making up one large format table, where the 
rows correspond to the logical lines and the columns are 
the various one-dimensional arrays. In the other group,
the subscript is simply the number of the occurrence of
that particular item since the beginning of the format 
specifications. The reader will find it helpful to refer 
to Figure 9 while reading the remainder of this section. 

The reason that so many arrays are required is that
the format specifications are preprocessed to the maximum
extent that is feasible. The philosophy is that the
extra time and space expended at this point greatly enhance
overall program efficiency, since this processing is done
only once and each format specification is processed many,
many times during the output operations. This section of
the program is enclosed in one large loop that is entered
when the *FORMAT control card is encountered. It processes
the format specifications logical line by logical line until
the *EOF control card is encountered and the loop is exited.

To process the specifications for a logical line the 
program first takes the two form cards, and stores columns 
2-80 of the first and 1-53 of the second as a character 
string variable of length 132 in the format table. Next 
GIPSY scans across this string looking for slashes that 
indicate print positions. When one is found, its column 
number is placed in the array POSITION(k) (from the second 
group described above), the slash is replaced by a blank, 
the subscript incremented, and the scan then continues in 
this fashion until all 132 print positions have been
examined.
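
The scan of the two form cards can be sketched in Python as follows. The function name scan_form and the sample card are hypothetical, and the slash is hard-coded rather than taken from the SLASH variable.

```python
def scan_form(card1, card2):
    """Build the 132-character form string and find its print positions.

    Columns 2-80 of the first form card and 1-53 of the second are
    joined; each slash marks a print position and is blanked out.
    """
    form = card1.ljust(80)[1:80] + card2.ljust(80)[0:53]
    positions = []
    chars = list(form)
    for column, ch in enumerate(chars, start=1):
        if ch == "/":
            positions.append(column)    # 1-based print position
            chars[column - 1] = " "     # the slash itself is not printed
    return "".join(chars), positions


# Hypothetical first form card: slashes in card columns 2 and 6.
card1 = " /" + " " * 3 + "/" + " " * 6 + "PAGE"
form, positions = scan_form(card1, "")
print(len(form), positions)
```

Because column 1 of the first card is dropped, a slash punched in card column 2 becomes print position 1.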
[Figure: THE FORMAT TABLES]

Next GIPSY looks at the fields in columns 54 to 80 of
the second form card, extracts the values, and places
them in the appropriate arrays in the format table. At
this point the field in column 55 (NOCONTENT) is checked
for a slash; if one is present the remainder of this section
is bypassed, and processing of the next logical line is
begun. If, however, a content card is present the PACKHOLD
procedure is used to read and pack all the content cards
into HOLD.

Throughout the rest of this section HOLD is constantly
being picked apart; the information is processed and placed
in arrays. Whenever anything is extracted it is replaced
by a symbol so that when processing is completed what 
remains is a symbolic representation of the content card. 
This string is then placed in the format table along with 
all the information it represents. 

As HOLD passes through this preprocessing section of
coding the first items extracted are all the conditional
expressions in that logical line, and each is replaced in
HOLD by a single placeholder character. Each conditional
expression is decomposed and its pieces placed in a table
where the subscript corresponds to the number of the
occurrence of conditional expressions since the beginning
of the format specifications.

The data to the left of the slash is further decomposed
in that the variables are replaced by the '|' character.
These variables' numbers are placed in an array VARNO(i)
and the number of variables in the expression is placed in
the array NOVAR(i). The remaining character string is
placed in the array INSERT(i). The variable to the right
of the slash is looked up, and its number, prefixed with
the sign that was already there, is placed in the array
CONDITION(i).

After all the conditional expressions have been 
processed, GIPSY scans HOLD looking for variables and when- 
ever one is encountered it is replaced in HOLD by the 
character '|' and its number is placed in the table in array
VARIABLE(i).

As the last step in preprocessing, the logical line 
HOLD is scanned looking for fields that have been concaten- 
ated (indicated by double slashes) to form a single new 
field. The program then uses the angular brackets to 
delimit this new field. Once HOLD has been completely 
processed, its contents are placed in the format table in 
the array CONTENT(i), and the program loops back to commence 
processing the next logical line. 

D. THE SCREENING EXPRESSION

The section of the program that processes the *SCREEN
control card is without question the most interesting.
Here too, the "preprocessing" philosophy prevailed. This
section also makes use of a table. The screen table is
made up of 5 one-dimensional arrays, each corresponding to
a column of the table. The rows in the table correspond
to the 'character strings', in the same sequence that
they occur. Frequent referral to Figure 10 will aid in
making the following paragraphs clearer. The first step
is to use the PACKHOLD procedure to obtain the boolean
expression that describes the screening criteria. HOLD
is processed from left to right, picking off the symbols
one at a time.

1. The Screen Table

As each symbol is picked off it is examined for
certain characteristics which indicate to the program
how it is to be processed. If the symbol is a blank,
it is discarded; if it is a right or left parenthesis,
it is left as is, and the next symbol is extracted. The
key to the procedure is a symbol enclosed in single quotes;
this is a literal character string that is going to be
looked for later in the data base. When one is encountered,
the quotes are stripped off and it is placed in the screen
table in the array SEEK(i), it is replaced in HOLD by a
slash, and then i, the row number in the table, is increased
by one. Consequently, the rows in the screen table correspond
to these character strings in the order that
they are encountered.

If the symbol is one of the eight relational operators
it is removed from HOLD and placed in the table in
the array LOCATOR(i); the program then looks backward in
LOCATOR and if any of the previous positions are blank
they are filled with this same operator. Similarly, if the
symbol is an entry name, it is removed from HOLD, looked up
in the entry name table, and its entry number is placed in
the screen table in the array ENTRY(i); the program then
works backward in ENTRY placing this number in all unfilled
positions. At this point it can be seen that to each row
in the screen table there corresponds a slash in HOLD, and
this slash (and each row in the table) represents a simple
boolean expression as defined in Section VI-A. Finally, if
the symbol is a boolean operator AND, OR or NOT, all
characters but the first one of the operator name are removed
from HOLD.

PROCESSING A SCREEN

a. ORIGINAL EXPRESSION:

HOLD = ('SIM PROG' OR 'LIST PROC') AND 'LANG COMP' IN
DESCRIPTORS AND ('1964' OR '1965' OR '1966' IN DATE)
AND NOT ('GEH' OR 'YCC') AT LOCATION

b. ABBREVIATED FORM:

HOLD = (/O/)A/A(/O/O/)AN(/O/)

c. COMPLETELY PARENTHESIZED FORM:

SEEKER = (((((/O/)A/)A(((/O/)O/)))A((/O/)N)))
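
The construction of the screen table and of the abbreviated form of HOLD can be sketched in Python. This is a reconstruction from the prose, not the PL/I code. The function name build_screen_table, the regular expression used to pick off symbols, and the list of entry names are assumptions; abbreviated entry names are not handled; and only three of the five arrays of the screen table are built.

```python
import re

RELATIONS = {"IN", "AT", "GT", "LT", "GE", "LE", "EQ", "NE"}
BOOLEANS = {"AND", "OR", "NOT"}
# A symbol is a quoted string, a parenthesis, or a word; blanks are skipped.
TOKEN = re.compile(r"'[^']*'|[()]|[A-Z]+")


def build_screen_table(expression, entry_names):
    """Return (hold, seek, locator, entry) for a screening expression."""
    hold, seek, locator, entry = "", [], [], []
    for token in TOKEN.findall(expression):
        if token.startswith("'"):
            seek.append(token[1:-1])    # strip the quotes
            locator.append(None)        # blank until an operator appears
            entry.append(None)          # blank until an entry name appears
            hold += "/"
        elif token in ("(", ")"):
            hold += token
        elif token in RELATIONS:
            # fill every blank position with this operator
            locator = [token if x is None else x for x in locator]
        elif token in BOOLEANS:
            hold += token[0]            # keep only the first letter
        else:
            number = entry_names.index(token) + 1
            entry = [number if x is None else x for x in entry]
    return hold, seek, locator, entry


# Hypothetical entry name list for the bibliography example.
names = ["AUTHOR", "TITLE", "DESCRIPTORS", "DATE", "LOCATION"]
expression = (
    "('SIM PROG' OR 'LIST PROC') AND 'LANG COMP' IN DESCRIPTORS "
    "AND ('1964' OR '1965' OR '1966' IN DATE) AND NOT "
    "('GEH' OR 'YCC') AT LOCATION"
)
hold, seek, locator, entry = build_screen_table(expression, names)
print(hold)
print(list(zip(seek, locator, entry)))
```

For the expression in part a of the figure, the sketch yields the abbreviated form shown in part b, with one table row per slash.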

After HOLD has been processed in this manner and is 
in abbreviated form, the expression is completely parenthe- 
sized according to the normal hierarchy of boolean opera- 
tions. To do this requires three passes through HOLD from 
left to right. On the first pass each N (NOT) operator and 
its associated operand are enclosed in parentheses. Since 
NOT is a unary operator, the operator and the operand are 
exchanged in position to provide consistency for a routine
that is used later. On the second pass the A (AND) operator
and both of its operands are enclosed in parentheses. Then,
O (OR) operators and both their operands are parenthesized
on the last pass. Throughout these three passes, previous 
parenthesization is taken into account in determining the 
operands which are to be parenthesized at the current step. 
All this done, the expression is then itself enclosed in 
parentheses and stored in a variable named SEEKER.

2. Expression "Pre-Evaluation"

The final phase in the processing of a screening
expression utilizes a few shortcuts from Boolean algebra to
"pre-evaluate" the boolean expression. The algorithm which
does this starts with the first character in SEEKER and
moves to the right one character at a time until SEEKER has
been completely processed.

As the algorithm moves across SEEKER looking at each 
character, it keeps track of the parenthesis level with the 
variable M. When it encounters a left paren, M is incre- 
mented by one and when it encounters a right paren, M is 
decremented by one. If in processing SEEKER the character 
being looked at is a boolean operator, it is simply ignored 
and the next character is examined. 

When a slash (a simple boolean expression) is
encountered, the procedure NEXTLINE is called twice, once
with the value "false" supplied and once with the value
"true" supplied. Roughly speaking, NEXTLINE looks at the
remainder of the screening criteria and says that if, in
the evaluation of the screen, this simple boolean expression
is true (or false), then the next simple boolean expression
that must be evaluated is "i", the row number in the
screening table.

NEXTLINE's operation is centered around the fact
that in the logical expression (X OR Y), if the value of
X is true then there is no need to evaluate Y, since its
value has no effect on the overall value of the expression.
Only if X is false must Y be evaluated. The same holds for
the expression (X AND Y) if the value of X is false. Only
if X is true must Y be evaluated.

NEXTLINE records the current parenthesis level and 
proceeds across SEEKER adjusting the parenthesis level until 
it comes to the boolean operator that is associated with 
the slash that initiated the procedure call. Depending on 
which operator it is and also on the value supplied, the 
procedure then locates the next simple boolean expression 
that must be evaluated. This expression's row number in 
the screen table is coded and then returned. 

The first call to NEXTLINE supplies the value
"false." The coded number that NEXTLINE returns is then
placed in the screen table in the array FALS(i). The second
time NEXTLINE is called the value "true" is supplied and
the value returned is placed in the table in the array
TRU(i). The reason that the value returned by NEXTLINE is
coded is that for each call there are three possible
evaluation results: (1) there are no more simple boolean
expressions to be evaluated and the final value of the
screen is "true," (2) there are no more simple boolean
expressions to be evaluated and the final value of the
screen is "false," and (3) row "i" in the screen table must
be processed next.
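
The effect of the two NEXTLINE calls per slash can be
sketched as follows. This sketch does not follow the
character scan described above. As a substitute technique it
parses the completely parenthesized string into a small tree
and fills the two columns recursively, which is assumed to
yield the same table. The names parse and build_table are
hypothetical, slashes are assumed to be numbered left to
right starting at row 1, and the codes -1 ("true") and -2
("false") are the ones the program uses when the table is
later consulted.

```python
TRUE_CODE, FALSE_CODE = -1, -2


def parse(seeker):
    """Turn a fully parenthesized string into nested tuples.

    Leaves are screen-table row numbers (1, 2, ...); inner nodes are
    ("A", left, right), ("O", left, right) or ("N", operand).
    """
    pos = 0
    rows = 0

    def expr():
        nonlocal pos, rows
        if seeker[pos] == "/":
            pos += 1
            rows += 1
            return rows
        if seeker[pos] != "(":
            raise ValueError("unexpected character " + seeker[pos])
        pos += 1
        node = expr()
        while seeker[pos] != ")":
            op = seeker[pos]
            pos += 1
            # NOT follows its operand in SEEKER, so it needs no right operand.
            node = ("N", node) if op == "N" else (op, node, expr())
        pos += 1
        return node

    return expr(), rows


def build_table(seeker):
    """Return the TRU and FALS columns as lists (row 1 is element 0)."""
    tree, n = parse(seeker)
    tru = [0] * (n + 1)
    fals = [0] * (n + 1)

    def fill(node, on_true, on_false):
        """Fill the rows under `node`; return the first row to evaluate."""
        if isinstance(node, int):
            tru[node], fals[node] = on_true, on_false
            return node
        if node[0] == "N":
            return fill(node[1], on_false, on_true)
        right_first = fill(node[2], on_true, on_false)
        if node[0] == "A":      # X AND Y: Y matters only when X is true
            return fill(node[1], right_first, on_false)
        return fill(node[1], on_true, right_first)   # X OR Y

    fill(tree, TRUE_CODE, FALSE_CODE)
    return tru[1:], fals[1:]


tru, fals = build_table("(((((/O/)A/)A(((/O/)O/)))A((/O/)N)))")
```

For the example expression, row 1 ('SIM PROG') gets TRU = 3
and FALS = 2: if the first descriptor matches, the second
one need not be tested, and screening continues with
'LANG COMP' in row 3.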

The end result of this preprocessing of the screening
criteria is that later in the program, when a data record
is being screened for inclusion in the output, the program
enters the screen table at row one and evaluates this first
simple boolean expression. Depending on the outcome of
this evaluation, it looks in the appropriate column of the
screen table, either TRU(i) or FALS(i), to see what to do
next. If the number found there is positive, it enters
the table again at the indicated row. If the number is
negative, no more evaluations are necessary, and the final
value of the screen expression is true or false depending
on whether the number is -1 or -2.

E. THE REORDERING PROCEDURE 

When the #SETUP control card is processed the array
ORDER(i) is initialized by a DO loop such that ORDER(i) = i.
In the section of the program that screens and processes
the data records for output, the data records are accessed
by row number in TABLE(i,j) through the contents of ORDER,
i.e., TABLE(ORDER(i),j).

The first step in processing the #ORDER control card is 
to pick off the entry name and look its number up in the 
list of entry names. This number then becomes the column 
number, j, in TABLE(i,j). 

The reordering process discussed in Section VII-B then
becomes a matter of rearranging the contents of ORDER. The
algorithm which does this sorting is efficient in that, in
the worst case (the original order completely reversed), it
requires only N passes through the list, where N is the
number of items in the list.
The algorithm steps through the proper column of TABLE,
fetching the entries by row number based on the contents of
ORDER. It compares the first two members of the list. If
the first member is larger than the second, their
positions are exchanged so that member 1 is now member 2
and vice versa. If they are already in proper order nothing
is done. Next, members 2 and 3 are compared, and if member
2 is larger than member 3 their positions are exchanged.
If they are already in the proper order nothing is done and
the next two members are compared. This process is
continued until the last two members of the list have been
compared.

At this point a flag is tested to see whether any
exchanges were made during this pass through the list. If
none were made, the list is in proper order and the
procedure halts. If an exchange was made, the procedure
loops back through the list and the process is repeated.
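
The procedure just described is an exchange (bubble) sort
applied to the index list rather than to the data. A
minimal Python sketch follows. It assumes zero-based
indices, a table held as a list of rows, and the
hypothetical function name reorder. The pass count is
returned only so that the worst-case behavior can be
observed.

```python
def reorder(table, order, column):
    """Sort the index list `order` in place by table[row][column].

    Returns the number of passes made through the list.
    """
    passes = 0
    while True:
        exchanged = False                 # the flag tested after each pass
        for k in range(len(order) - 1):
            if table[order[k]][column] > table[order[k + 1]][column]:
                order[k], order[k + 1] = order[k + 1], order[k]
                exchanged = True
        passes += 1
        if not exchanged:
            return passes


table = [["SMITH", 1966], ["ADAMS", 1964], ["JONES", 1965]]
order = [0, 1, 2]
reorder(table, order, 0)                  # order by the first entry (name)
names = [table[row][0] for row in order]
```

Only ORDER is rearranged. The rows of the table never move,
so a later reordering on a different entry starts from the
same stored data.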

F. OUTPUT PROCESSING 

Certainly the most complex section of GIPSY is the one
that screens and outputs the data records. When this
section of coding is entered the *PRINT control card is
checked for the presence of the character string 'SCREEN'.
If it is found, a flag is set to indicate to the program
that the data records are to be compared against the most
recent screening criteria and only those which satisfy the
criteria are to be printed.
A few variables are then initialized, one of which is Q
(initialized to one). Q is used as a row marker in the format 
table. The program enters the format table at row Q and 
checks TITLEPAGE(Q) for the presence of a slash. If a 
slash is found, the program goes into a loop which prints 
FORM(Q) according to the dictates of the rest of line Q 
in the table and then increments Q by one. The loop is 
exited the first time that TITLEPAGE(Q) equals a blank. 

At this point another row marker for the format table,
V, is set equal to Q. V serves two purposes: it acts as
a flag which indicates the presence of page headings in
the format specifications, and it marks the beginning line
number in the format table of these page headings. If
PAGEHEAD(Q) contains a slash then the program goes into
a loop which prints FORM(Q) according to the dictates of
the rest of line Q in the format table, and then Q is
incremented by one. The loop is exited the first time that
PAGEHEAD(Q) equals a blank.

Q is now the number of the first line in the format
table which represents the first logical line that is to
be processed for each data record selected. Rows V through
Q-1 in the format table are the format specifications to be
processed each time that a new page is ejected.

With these administrative details out of the way, GIPSY
is ready to retrieve and process the data records. The
remainder of this section in the program is contained in a
loop that is traversed once for each record in the data
file. It is here that the variable ORDER(i) is used to
retrieve the data records. The contents of ORDER(i) are
accessed sequentially beginning with i equal to one.

1. Screening the Records

The program looks at the screen flag. If it is not
set, the next group of instructions is skipped and
all records are processed and printed. If the screen
flag is set, GIPSY starts with row one in the screen table
and fetches SEEK(1), LOCATOR(1), and ENTRY(1). Then the
entry pointed to by TABLE(ORDER(i),ENTRY(1)) is fetched
and compared with SEEK(1) according to the dictates of
LOCATOR(1). If the result of this comparison is "true"
then TRU(1) is stored in M; if the result is "false" then
FALS(1) is stored in M.

The program then examines M. If it is greater than
zero, further screening is required and the entry
pointed to by TABLE(ORDER(i),ENTRY(M)) is fetched and
compared with SEEK(M) according to the dictates of
LOCATOR(M), and so on until the value in M is negative.
This indicates that no further screening is required. If
M is -1, the record satisfied the screening criteria
and is passed on to be printed. If M is -2, the
record failed to satisfy the screening criteria, the
rest of this section is bypassed, and the next record is
obtained.
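
The table walk described in this subsection can be sketched
in Python as follows. The record is modeled as a dictionary
keyed by entry name rather than by column number, and
LOCATOR is modeled as a comparison function. The two
functions contains and equals are hypothetical stand-ins
only, since the actual meaning of each LOCATOR setting is
defined elsewhere in the paper and is not reproduced here.

```python
TRUE_CODE, FALSE_CODE = -1, -2        # codes described in the text above


def contains(seek, value):            # hypothetical stand-in for a LOCATOR
    return seek in value


def equals(seek, value):              # hypothetical stand-in for a LOCATOR
    return seek == value


def passes_screen(record, rows):
    """Walk the screen table for one record, starting at row one.

    `rows` is a list of dictionaries with the keys seek, locator,
    entry, tru and fals; row 1 of the table is element 0.
    """
    m = 1
    while m > 0:
        row = rows[m - 1]
        hit = row["locator"](row["seek"], record[row["entry"]])
        m = row["tru"] if hit else row["fals"]
    return m == TRUE_CODE


# Table assumed for: 'LANG COMP' IN DESCRIPTORS AND NOT 'GEH' AT LOCATION
rows = [
    {"seek": "LANG COMP", "locator": contains, "entry": "DESCRIPTORS",
     "tru": 2, "fals": FALSE_CODE},
    {"seek": "GEH", "locator": equals, "entry": "LOCATION",
     "tru": FALSE_CODE, "fals": TRUE_CODE},
]
kept = passes_screen(
    {"DESCRIPTORS": "SIM PROG, LANG COMP", "LOCATION": "NPS"}, rows)
```

Each record costs at most one comparison per table row, and
often fewer, because the TRU and FALS columns already encode
which comparisons can be skipped.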
2. Formatting the Output

A data record reaches this point in output processing
in one of two ways: it satisfactorily passed the screen
or it was not required to be screened. The variable
COUNTER is incremented by one and a loop is entered that
starts at row Q in the format table and, utilizing the
current data record, cycles through the rest of the table
printing the specified information. The first step in the
loop is to check the logical line to see if there is a
slash in NOCONTENT(i). If one is present the program
skips the next group of instructions, which are concerned
with processing the content of the line. Otherwise
CONTENT(i) is fetched from the format table and is processed.

In processing a logical line for printing, the
contents of FORM(i) are saved in a temporary variable for
overprinting purposes, and CONTENT(i) is processed
character by character. HOLD, a character variable of
length 3000, is used to hold the logical line as it is
being assembled for printing.

First the character string from position 1 to
POSITION(k)-1 in FORM(i) is placed in HOLD. This character
string contains any associated text that was on the format
cards. Next the first field from CONTENT(i) is processed,
placing literals, appropriate data record entries or
variables, and/or conditional information in HOLD,
beginning in position POSITION(k). Then the next piece of
FORM(i) is placed in HOLD and the next field processed
and placed in HOLD, and so on until the whole logical
line, fully processed, is in HOLD.

In processing an individual field from CONTENT(i),
the possibility of the field being a composite of fields
and the possibility of conditional expressions being
included must be taken into account. If a conditional
expression is present then CONDITION(j) must
be checked. If the condition is not satisfied then the
expression is ignored and processing of CONTENT(i) is
resumed. If the condition is satisfied then INSERT(j) is
processed as if it were a subfield. If the field is a
composite of fields then all of the individual fields are
processed as subfields. Once all the subfields have been
processed, they are placed in HOLD as if they were one
field.

If in processing any field or subfield, an entry 
to be inserted turns out to be the null entry, the program 
checks SUPPRESS(i) for the presence of a slash. If it is 
blank the program simply treats the null string as a valid 
entry and continues processing in the normal manner. If, 
however, it is a slash then the program eliminates all 
literals associated with that entry. If the whole field 
then turns out to be null, all associated text is also 
eliminated. This results in the null string being placed 
in HOLD for the corresponding print position, POSITION(k). 

3. Printing the Output

The program is now ready to begin printing. HOLD
contains the logical line, and FORM(i) contains the text
to be overprinted. The logical line must be checked to
see if it is longer than a physical line; in other words,
the length of HOLD is checked against STOPCOL(i). If it
is longer, the program takes HOLD at the STOPCOL(i)
position and works backward until it comes to the first
blank. The first physical line is then taken up to that
point. This character string is padded on the right with
blanks to fill out to 132 characters and the result is
placed in an array named OUTLINE(n). A loop is then entered
that takes the next available OUTLINE(n) and fills in blanks
from position 1 to position IDENT(i)-1. The program takes
the next STOPCOL(i)-IDENT(i) characters of HOLD, finds
the end of the last word, pads on the right with blanks,
places the result in the current OUTLINE(n), and then
increments n. The loop is exited when the logical line is
in a form suitable for printing.
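
The splitting of a logical line into physical lines can be
sketched in Python as follows. The function name and its
parameters are hypothetical: stopcol and indent play the
roles of STOPCOL(i) and IDENT(i), and the returned list
plays the role of OUTLINE. Breaking in the middle of a word
when no blank is available is an assumption, since the text
does not say what the program does in that case.

```python
def split_logical_line(hold, stopcol, indent, width=132):
    """Break `hold` into physical lines, each padded to `width` columns.

    The first line may use columns 1 through stopcol; continuation
    lines start at column `indent` and carry at most
    stopcol - indent characters of text.
    """
    if stopcol <= indent:
        raise ValueError("stopcol must be greater than indent")
    lines = []
    prefix = ""
    limit = stopcol
    rest = hold
    while len(rest) > limit:
        cut = rest.rfind(" ", 0, limit + 1)   # work backward to a blank
        if cut <= 0:
            cut = limit                       # assumed: break inside the word
        lines.append((prefix + rest[:cut].rstrip()).ljust(width))
        rest = rest[cut:].lstrip()
        prefix = " " * (indent - 1)           # blanks in columns 1..indent-1
        limit = stopcol - indent
    lines.append((prefix + rest).ljust(width))
    return lines


outline = split_logical_line(
    "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG",
    stopcol=20, indent=5, width=30)
```

Every element of the result has the same length, so each
one can be sent to the printer as a complete physical line.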

The program now looks to see at what point on the
page the last logical line completed printing. It adds to
this the number of lines that are required to print
the current logical line, and if this sum is greater than
the value of LASTLINE(i), the page number is printed (if
specified) and a new page is ejected.

At this point, the number of lines indicated by
TOPLINE(i) is skipped and OVERPRINT(i) is checked for a
slash. If a slash is present, FORM(i) is printed on
the same line COLOR-1 times and then OUTLINE(1) is printed
on this same line. If a slash is not present, just
OUTLINE(1) is printed. If required, the program enters a
loop that skips the number of lines indicated by the value
of SKIPLINES(i), prints OUTLINE(n), and then increments
n. The loop is exited when the logical line is completely
printed.

This point in the program marks the end of two 
loops, the loop that is processing the logical lines and 
the loop that is processing the data records. When the 
last logical line associated with the last data record 
has been processed, control is transferred to the section 
of the program that is to process the next control card. 

G. ELAPSED TIME PROCESSING 

When the *TIME control card is encountered the current
time of day is obtained immediately. Next, the text to be
printed and its attendant formatting information are
obtained. The time of day is returned as a character
string; consequently, the various substrings that represent
hours, minutes, etc., must be converted to fixed binary
numbers in order that arithmetic operations may be
performed. All these quantities are converted to
milliseconds and summed. From this quantity is subtracted
the value of CLOCK (described in Section VIII-A) and the
result is converted to seconds. CLOCK is updated so that it
now contains the time of day, in milliseconds, of the most
recent processing of the *TIME control card. Finally, the
elapsed time, in seconds, followed by the text, is printed
according to the format specified.
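
The arithmetic can be sketched in Python as follows. The
layout of the time-of-day string is an assumption: nine
digits in the form HHMMSSTTT (hours, minutes, seconds,
milliseconds). The function names are hypothetical.

```python
def to_milliseconds(time_of_day):
    """Convert an assumed 'HHMMSSTTT' string to milliseconds since midnight."""
    hours = int(time_of_day[0:2])
    minutes = int(time_of_day[2:4])
    seconds = int(time_of_day[4:6])
    millis = int(time_of_day[6:9])
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def elapsed_seconds(time_of_day, clock):
    """Return (elapsed time in seconds, updated value for CLOCK)."""
    now = to_milliseconds(time_of_day)
    return (now - clock) / 1000, now


clock = to_milliseconds("101500000")               # 10:15:00.000
elapsed, clock = elapsed_seconds("101612500", clock)   # 10:16:12.500
```

This sketch, like the description above, makes no allowance
for a run that passes midnight, in which case the difference
would come out negative.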
IX. CONCLUDING REMARKS



There are several modifications and extensions that
could be made to the program which would extend the scope
of applicability and the usefulness of GIPSY. The first
such modification would be to adapt it to operate in a
time sharing mode instead of its present batch mode. An
efficient algorithm could be included that would provide
for swapping WORD(i) vectors between core and a disk, such
that only one vector is in core at any one time. This
would reduce the core storage requirements of GIPSY. Also,
it would be relatively easy to give GIPSY the capability
of adding, deleting, and/or modifying records in the data
file. A natural language preprocessor that would make it
easier to specify the different tasks that are to be
performed could be added, too. As more experience with
using GIPSY is accumulated, other improvements will
probably suggest themselves.